{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Custom Model Compilation and Inference using TVM-NEO-DLR\n",
    "In this example notebook, we describe how to take a pre-trained classification model and compile it using ***TVM*** compiler to generate deployable artifacts that can be deployed on the target using the ***NEO-AI-DLR*** interface. \n",
    " - Pre-trained model: `resnet18` model trained on ***ImageNet*** dataset. \n",
    "\n",
    "In particular, we will show how to\n",
    "- compile the model (during heterogenous model compilation, layers that are supported will be offloaded to the`TI-DSP`)\n",
    "- use the generated artifacts for inference\n",
    "- perform input preprocessing and output postprocessing.\n",
    "- enable debug logs\n",
    "- use deny-layer compilation option to isolate possible problematic layers and create additional model subgraphs\n",
    "- use the generated subgraphs artifacts for inference\n",
    "- perform input preprocessing and output postprocessing\n",
    "     \n",
    "## Neo-AI-DLR based workflow\n",
    "The diagram below describes the steps for TVM/NEO-AI-DLR based workflow. \n",
    "\n",
    "Note: \n",
    "- The user needs to compile models(sub-graph creation and quantization) on a PC to generate model artifacts.\n",
    "- The generated artifacts can then be used to run inference on the target.\n",
    "\n",
    "<img src=docs/images/tvmrt_work_flow_2.png width=\"400\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import onnx\n",
    "import cv2\n",
    "import shutil \n",
    "from tvm import relay\n",
    "from tvm.relay.backend.contrib import tidl\n",
    "from dlr import DLRModel\n",
    "from scripts.utils import imagenet_class_to_name, download_model\n",
    "from pathlib import Path\n",
    "from IPython.display import Markdown as md\n",
    "from scripts.utils import loggerWritter\n",
    "from scripts.utils import get_svg_path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the model in its native framework\n",
    "The `resnet18v2` model used here is trained using the ***ImageNet*** dataset saved in `/model-zoo`. \n",
    "- Note: An ***ONNX*** model has several inputs nodes, which include the weights and biases for the compute layers, as well as the input to the model. Below, we are printing the details of the input node that correspond to the model input. From the printed output, we will gather the `name` and the `shape` of the model input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "onnx_model_path = 'models/public/onnx/resnet18_opset9.onnx'\n",
    "download_model(onnx_model_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "onnx_model = onnx.load(onnx_model_path)\n",
    "print(len(onnx_model.graph.input))\n",
    "onnx_model.graph.input[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# we use the output from the cell above to populate these variables\n",
    "input_name = 'input.1'\n",
    "input_shape = (1, 3, 224, 224)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convert the model to `Relay IR` format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "mod, params = relay.frontend.from_onnx(onnx_model, shape={input_name : input_shape})\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define utility function to preprocess input images\n",
    "\n",
    "Below, we define a utility function to preprocess images for `resnet18v2`. This function takes a path as input, loads the image and preprocesses it for generic ***ONNX*** inference. The steps are as follows: \n",
    "\n",
    " 1. load image\n",
    " 2. convert BGR image to RGB\n",
    " 3. scale image so that the short edge is 256 pixels\n",
    " 4. center-crop image to 224x224 pixels\n",
    " 5. apply per-channel pixel scaling and mean subtraction\n",
    " 6. convert the image to NCHW format\n",
    "\n",
    "\n",
    "- Note: If you are using a custom model or a model that was trained using a different framework, please remember to define your own utility function. For example, if you are using a model trained using ***Tensorflow***, you might need to use a different set of `mean` and `scale` values for *step 5* above and you might need to  modify *step 6* to convert the image to `NHWC` format.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess(image_path):\n",
    "    \n",
    "    # read the image using openCV\n",
    "    img = cv2.imread(image_path)\n",
    "    \n",
    "    # convert to RGB\n",
    "    img = img[:,:,::-1]\n",
    "    \n",
    "    # Most of the onnx models are trained using\n",
    "    # 224x224 images. The general rule of thumb\n",
    "    # is to scale the input image while preserving\n",
    "    # the original aspect ratio so that the\n",
    "    # short edge is 256 pixels, and then\n",
    "    # center-crop the scaled image to 224x224\n",
    "    orig_height, orig_width, _ = img.shape\n",
    "    short_edge = min(img.shape[:2])\n",
    "    new_height = (orig_height * 256) // short_edge\n",
    "    new_width = (orig_width * 256) // short_edge\n",
    "    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)\n",
    "\n",
    "    startx = new_width//2 - (224//2)\n",
    "    starty = new_height//2 - (224//2)\n",
    "    img = img[starty:starty+224,startx:startx+224]\n",
    "    \n",
    "    # apply scaling and mean subtraction.\n",
    "    # if your model is built with an input\n",
    "    # normalization layer, then you might\n",
    "    # need to skip this\n",
    "    img = img.astype('float32')\n",
    "    for mean, scale, ch in zip([123.675, 116.28, 103.53], [0.017125, 0.017507, 0.017429], range(img.shape[2])):\n",
    "            img[:,:,ch] = ((img.astype('float32')[:,:,ch] - mean) * scale)\n",
    "     \n",
    "    # convert HWC to NCHW\n",
    "    img = np.expand_dims(np.transpose(img, (2,0,1)),axis=0)\n",
    "    \n",
    "    return img"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compile the model\n",
    "In this step, we convert the `Relay IR` module into deployable artifacts with layers offloaded to `TIDL`. The deployable artifacts and all intermediate files are stored in the `output_dir` defined below.   \n",
    "\n",
    "- Note: Since `TIDL` uses quantized models for inference, layer outputs must be calibrated by running dummy inferences and collecting quantization statistics. We do this by feeding 4 images from the validation subset of the ***ImageNet*** dataset with appropriate preprocessing. It is mandatory that inputs are preprocessed according to model requirements. \n",
    "     \n",
    "    The script below calls `TIDLCompiler` with the following arguments: \n",
    "    * **platform** = 'am68pa' to identify the device \n",
    "    * **version** = (7, 3) to identify the version \n",
    "    * **tidl_tools_path** = os.getenv('TIDL_TOOLS_PATH'), path to `TIDL` compilation tools \n",
    "    * **artifacts_folder** = output_dir, where all intermediate results are stored\n",
    "    * **tensor_bits** = 8, or 16, is the number of bits to be used for  quantization \n",
    "    * **max_num_subgraphs** = 16, the maximum number of `TIDL` subgraphs to split into \n",
    "    * **accuracy_level** = 0, for fastest compilation with acceptable drop in accuracy\n",
    "    * **c7x_codegen** = 0 \n",
    "    \n",
    "     \n",
    "- Note: The path to the `TIDL` compilation tools and `aarch64` `GCC` compiler is required for model compilation, both of which can be accessed by this notebook using predefined environment variables `TIDL_TOOLS_PATH` and `ARM64_GCC_PATH`. The example usage of both the variables is demonstrated in the cell below. \n",
    "     \n",
    "- Note: This model does not require `accuracy_level` greater than `0` and delivers great accuracy with simple quantization and calibration with 4 images. However, some models may require a higher number for `accuracy_level`, in which case, the following changes are recommended** \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "calib_input_list = []\n",
    "output_dir = 'custom-artifacts/tvm-dlr/resnet'\n",
    "\n",
    "#TRAGET Build\n",
    "build_target = 'llvm -device=arm_cpu -mtriple=aarch64-linux-gnu'\n",
    "cross_cc_args = {'cc' : os.path.join(os.environ['ARM64_GCC_PATH'], 'bin', 'aarch64-none-linux-gnu-gcc')}\n",
    "\n",
    "#PC Emulation BUILD\n",
    "#build_target = 'llvm'\n",
    "#cross_cc_args = {}\n",
    "\n",
    "# create the output dir if not preset\n",
    "# clear the directory\n",
    "os.makedirs(output_dir, exist_ok=True)\n",
    "for root, dirs, files in os.walk(output_dir, topdown=False):\n",
    "    [os.remove(os.path.join(root, f)) for f in files]\n",
    "    [os.rmdir(os.path.join(root, d)) for d in dirs]\n",
    "    \n",
    "# build the list of preprocessed images that will be used for calibration\n",
    "calib_images = [\n",
    "'sample-images/elephant.bmp',\n",
    "'sample-images/bus.bmp',\n",
    "'sample-images/bicycle.bmp',\n",
    "'sample-images/zebra.bmp',\n",
    "]\n",
    "for filename in calib_images:\n",
    "    calib_input_list.append({input_name : preprocess(filename)})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compilation knobs  (optional - In case of debugging accuracy)\n",
    "if a model accuracy at 8bits is not good, user's can try compiling same model at 16 bits with accuracy level of 1. This will reduce the performance, but it will give users a good accuracy bar.\n",
    "As a second step, user can try to increase 8 bits accuracy by increasing the number of calibration frames and iterations, in order to get closer to 16 bits + accuracy level of 1 results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#compilation options - knobs to tweak \n",
    "num_bits =8\n",
    "accuracy =0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Layers debug (optional - In case of debugging)\n",
    "Debug_level 3 gives layer information and warnings/erros which could be useful during debug. User's can see logs from compilation inside a giving path to \"loggerWritter\" helper function.\n",
    "\n",
    "Another technique is to use deny_list to exclude layers from running on TIDL and create additional subgraphs, in order to aisolate issues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "log_dir = Path(\"logs\").mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# stdout and stderr saved to a *.log file.  \n",
    "with loggerWritter(\"logs/custon-model-tvm-dlr\"):\n",
    "    # Create the TIDL compiler\n",
    "    tidl_compiler = tidl.TIDLCompiler(\n",
    "        os.environ['SOC'],\n",
    "        (7, 3),\n",
    "        tidl_tools_path = os.getenv('TIDL_TOOLS_PATH'),\n",
    "        artifacts_folder = output_dir,\n",
    "        tensor_bits = num_bits,\n",
    "        max_num_subgraphs = 16,\n",
    "        accuracy_level = accuracy,\n",
    "        advanced_options = { 'calibration_iterations' : 3},\n",
    "        debug_level = 3,\n",
    "        c7x_codegen = 0,\n",
    "        #deny_list = \"nn.batch_flatten\"  #Comma separated string of operator types as defined by TVM Relay ops. Ex: \"nn.batch_flatten\"\n",
    "    )\n",
    "# partition the graph into TIDL operations and TVM operations\n",
    "mod, status = tidl_compiler.enable(mod, params, calib_input_list)\n",
    "\n",
    "# build the relay module into deployables\n",
    "with tidl.build_config(tidl_compiler=tidl_compiler):\n",
    "    graph, lib, params = relay.build_module.build(mod, target=build_target, params=params)\n",
    "tidl.remove_tidl_params(params)\n",
    "\n",
    "# save the deployables\n",
    "path_lib = os.path.join(output_dir, 'deploy_lib.so')\n",
    "path_graph = os.path.join(output_dir, 'deploy_graph.json')\n",
    "path_params = os.path.join(output_dir, 'deploy_params.params')\n",
    "\n",
    "lib.export_library(path_lib, **cross_cc_args)\n",
    "with open(path_graph, \"w\") as fo:\n",
    "    fo.write(graph)\n",
    "with open(path_params, \"wb\") as fo:\n",
    "    fo.write(relay.save_param_dict(params))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>Note:</b> Please note 'deny_list' is used in above cell as an example and it can be deleted as \"flatten\" is a supported layer\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subgraphs visualization  (optional - In case of debugging models and subgraps)\n",
    "Running below cell gives links to complete graph and TIDL subgraphs visualizations. This, along with \"deny_list\" feature, explained above, offer tools for potencially checking and isolating issues in NN model layers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subgraph_link =get_svg_path(output_dir) \n",
    "for sg in subgraph_link:\n",
    "    hl_text = os.path.join(*Path(sg).parts[4:])\n",
    "    sg_rel = os.path.join('../', sg)\n",
    "    display(md(\"[{}]({})\".format(hl_text,sg_rel)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use compiled model for inference\n",
    "\n",
    "Then using the ***NEO-AI DLR*** interface we run the model and collect benchmark data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use deployed artifacts from the compiled model \n",
    "model = DLRModel(output_dir, 'cpu')\n",
    "\n",
    "# run inference\n",
    "#Running inference several times to get an stable performance output\n",
    "for i in range(5):\n",
    "    res = model.run({input_name : preprocess('sample-images/elephant.bmp')})\n",
    "\n",
    "from scripts.utils import imagenet_class_to_name\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# get the TOP-5 class IDs by argsort()\n",
    "# and use utility function to get names\n",
    "output = res[0].squeeze()\n",
    "classes = output.argsort()[-5:][::-1]\n",
    "print([imagenet_class_to_name(x)[0] for x in classes])\n",
    "\n",
    "# collect benchmark data \n",
    "from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output\n",
    "stats = model.get_TI_benchmark_data()\n",
    "fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))\n",
    "plot_TI_performance_data(stats, axis=ax)\n",
    "plt.show()\n",
    "\n",
    "tt, st, rb, wb = get_benchmark_output(stats)\n",
    "print(f'Statistics : \\n Inferences Per Second   : {1000.0/tt :7.2f} fps')\n",
    "print(f' Inference Time Per Image : {tt :7.2f} ms  \\n DDR BW Per Image        : {rb+ wb : 7.2f} MB')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## EVM's console logs (optional - in case of inference failure)\n",
    "\n",
    "To copy console logs from EVM to TI EdgeAI Cloud user's workspace, go to: \"Help -> Troubleshooting -> EVM console log\", In TI's EdgeAI Cloud landing page.\n",
    "\n",
    "Alternatevely, from workspace, open/run evm-console-log.ipynb"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
